Extracting Semantic Representations from Large Text Corpora
Abstract
Many connectionist language processing models have now reached a level of detail at which more realistic representations of semantics are required. In this paper we discuss the extraction of semantic representations from the word co-occurrence statistics of large text corpora and present a preliminary investigation into the validation and optimisation of such representations. We find that there is significantly more variation across the extraction procedures and evaluation criteria than is commonly assumed.
Similar resources
Measuring Word Significance using Distributed Representations of Words
Distributed representations of words as real-valued vectors in a relatively low-dimensional space aim at extracting syntactic and semantic features from large text corpora. A recently introduced neural network, named word2vec (Mikolov et al., 2013a; Mikolov et al., 2013b), was shown to encode semantic information in the direction of the word vectors. In this brief report, it is proposed to use t...
Computational Methods to Extract Meaning From Text and Advance Theories of Human Cognition
Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researc...
Semantic Typology and Parallel Corpora: Something about Indefinite Pronouns
Patterns of crosslinguistic variation in the expression of word meaning are informative about semantic organization, but most methods to study this are labor intensive and obscure the gradient nature of concepts. We propose an automatic method for extracting crosslinguistic co-categorization patterns from parallel texts, and explore the properties of the data as a potential source for automatic...
Automatic Knowledge Acquisition by Semantic Analysis and Assimilation of Textual Information
Automatic knowledge acquisition is one of the bottlenecks in artificial intelligence and large-scale applications of natural language processing (NLP). There are many efforts to create large knowledge bases (KBs) or to automatically derive knowledge from large text corpora. On the one hand, we meet KBs like CYC, where a tremendous amount of work has been invested by knowledge enterers who have ...
Automatic extraction of property norm-like data from large text corpora
Traditional methods for deriving property-based representations of concepts from text have focused on either extracting only a subset of possible relation types, such as hyponymy/hypernymy (e.g., car is-a vehicle) or meronymy/metonymy (e.g., car has wheels), or unspecified relations (e.g., car--petrol). We propose a system for the challenging task of automatic, large-scale acquisition of uncons...
Publication date: 1997